Mining Frequent Patterns Based on Data Characteristics
Authors
Abstract
Frequent pattern mining is a crucial part of association rule mining and other data mining tasks with many practical applications. Current popular algorithms for frequent pattern mining perform differently: some are good for dense databases, while others are ideal for sparse ones. In our previous research, we developed a new frequent pattern mining algorithm named FEM that runs fast on both sparse and dense databases. FEM combines the mining strategies of FP-growth and Eclat and, given a user-specified threshold, adapts its mining behavior to the data characteristics to efficiently find all short and long patterns in different database types. However, for FEM to perform best, the user must select an appropriate threshold value to control the switching between its two mining tasks. In this paper, we present DFEM, an improved version of FEM that automatically adopts a dynamic runtime threshold to better fit the characteristics of the database. The experimental results show that DFEM outperforms FEM and other popular frequent pattern mining algorithms, including Apriori, Eclat, and FP-growth, on both sparse and dense databases.
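For readers unfamiliar with the vertical mining strategy the abstract mentions, the following is a minimal sketch of an Eclat-style miner. It is not FEM or DFEM and contains no threshold-based switching; the function name `eclat`, its parameters, and the toy data are assumptions made only for illustration. It shows how frequent itemsets can be found by intersecting transaction-id sets.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose support count is at least min_support.

    Illustrative Eclat-style sketch (hypothetical helper, not the FEM/DFEM code).
    """
    # Vertical layout: map each item to the set of transaction ids containing it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], candidates: List[Tuple[str, Set[int]]]) -> None:
        # Every candidate passed in is already known to be frequent with this prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect tidsets with the remaining items to grow the itemset.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    # Start from the frequent single items, in a fixed order to avoid duplicates.
    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), initial)
    return frequent


if __name__ == "__main__":
    toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(eclat(toy, 3).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

Tidset intersection tends to suit dense data, where many items share transactions; a prefix-tree approach such as FP-growth compresses the database differently. The sketch covers only the vertical half of that contrast.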
Similar resources
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
High fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It involves extracting patterns that are frequently accessed in mobile web service sequences. Unlike the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...
Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Today, many modern applications search for frequent, repeating patterns in the data sets they analyze. Such a search looks for patterns that appear frequently in the data set and marks them as frequent patterns so that users can make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...
A Numerical Method for Frequent Patterns Mining
Frequent pattern mining is one of the active research themes in data mining. It plays an important role in all data mining tasks such as clustering, classification, prediction, and association analysis. Identifying all frequent patterns is the most time consuming process due to a massive number of patterns generated. A reasonable solution is identifying maximal frequent patterns which form the ...
Smart frequent itemsets mining algorithm based on FP-tree and DIFFset data structures
Association rule data mining is an important technique for finding important relationships in large datasets. Several frequent itemsets mining techniques have been proposed using a prefix-tree structure, FP-tree, a compressed data structure for database representation. The DIFFset data structure has also been shown to significantly reduce the run time and memory utilization of some data mining ...